
Bias-variance Tradeoff in Tensor Estimation

Kumar, Shivam, Xu, Haotian, Padilla, Carlos Misael Madrid, Khoo, Yuehaw, Padilla, Oscar Hernan Madrid, Wang, Daren

arXiv.org Machine Learning

We study denoising of a third-order tensor when the ground-truth tensor is not necessarily Tucker low-rank. Specifically, we observe $$ Y = X^\ast + Z \in \mathbb{R}^{p_{1} \times p_{2} \times p_{3}}, $$ where $X^\ast$ is the ground-truth tensor and $Z$ is the noise tensor. We propose a simple variant of the higher-order tensor SVD estimator $\widetilde{X}$. We show that, uniformly over all user-specified Tucker ranks $(r_{1},r_{2},r_{3})$, $$ \| \widetilde{X} - X^\ast \|_{\mathrm{F}}^2 = O \Big( \kappa^2 \Big\{ r_{1}r_{2}r_{3} + \sum_{k=1}^{3} p_{k} r_{k} \Big\} \; + \; \xi_{(r_{1},r_{2},r_{3})}^2 \Big) \quad \text{ with high probability.} $$ Here, the bias term $\xi_{(r_1,r_2,r_3)}$ is the best achievable approximation error of $X^\ast$ over the class of tensors with Tucker ranks $(r_1,r_2,r_3)$; $\kappa^2$ quantifies the noise level; and the variance term $\kappa^2 \{r_{1}r_{2}r_{3}+\sum_{k=1}^{3} p_{k} r_{k}\}$ scales with the effective number of free parameters in the estimator $\widetilde{X}$. Our analysis achieves a clean rank-adaptive bias-variance tradeoff: as we increase the ranks of the estimator $\widetilde{X}$, the bias $\xi_{(r_{1},r_{2},r_{3})}$ decreases and the variance increases. As a byproduct, we also obtain a convenient bias-variance decomposition for vanilla low-rank SVD matrix estimators.
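The estimator studied here is a variant of higher-order SVD (HOSVD). As a rough illustration of the underlying mechanism (vanilla rank-$(r_1,r_2,r_3)$ HOSVD truncation, not the authors' exact variant), the denoising step can be sketched in NumPy: unfold the noisy tensor along each mode, keep the leading left singular vectors, and project back.

```python
import numpy as np

def unfold(T, mode):
    # Mode-k unfolding: move axis `mode` to the front and flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    # Inverse of `unfold` for a tensor of the given target shape.
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def mode_product(T, M, mode):
    # Multiply the matrix M into mode `mode` of the tensor T.
    shape = list(T.shape)
    shape[mode] = M.shape[0]
    return fold(M @ unfold(T, mode), mode, shape)

def hosvd_truncate(Y, ranks):
    """Rank-(r1, r2, r3) HOSVD truncation of a third-order tensor Y."""
    # Leading r_k left singular vectors of each mode-k unfolding of Y.
    U = [np.linalg.svd(unfold(Y, k), full_matrices=False)[0][:, :r]
         for k, r in enumerate(ranks)]
    # Multilinear projection of Y onto the spans of U_1, U_2, U_3.
    X = Y
    for k in range(3):
        X = mode_product(X, U[k] @ U[k].T, k)
    return X
```

Choosing larger `ranks` enlarges the projection subspaces, which is exactly the tradeoff in the bound: the approximation (bias) term shrinks while the number of free parameters (variance) grows.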



Representative Language Generation

Peale, Charlotte, Raman, Vinod, Reingold, Omer

arXiv.org Artificial Intelligence

For decades, a central paradigm in machine learning has been prediction, where models are trained to map input data to specific output variables or categories. This approach encompasses tasks such as classification, regression, and forecasting, where the goal is to accurately estimate outcomes based on given inputs. However, recent years have seen a significant shift toward generative models, such as Large Language Models (LLMs) and diffusion-based image generators. These models are designed not to predict specific outcomes, but to create new data that resembles their training sets, offering a different approach to machine learning tasks. This shift towards generative models necessitates the development of new theoretical frameworks to rigorously analyze their performance, capabilities, and limitations. Recently, [KM24] proposed a theoretical framework that encapsulates the fundamental objective of generative models: after being shown a sequence of strings from an unknown target language (such as all valid code snippets in Java), generate new, unseen strings from the target language. Informally, we say that a model satisfies generation in the limit if it achieves this goal after seeing a finite number of strings from the target language.


Theoretical Guarantees for High Order Trajectory Refinement in Generative Flows

Gong, Chengyue, Li, Xiaoyu, Liang, Yingyu, Long, Jiangxuan, Shi, Zhenmei, Song, Zhao, Tian, Yu

arXiv.org Machine Learning

Flow matching has emerged as a powerful framework for generative modeling, offering computational advantages over diffusion models by leveraging deterministic Ordinary Differential Equations (ODEs) instead of stochastic dynamics. While prior work established the worst-case optimality of standard flow matching under Wasserstein distances, the theoretical guarantees for higher-order flow matching, which incorporates acceleration terms to refine sample trajectories, remain unexplored. In this paper, we bridge this gap by proving that higher-order flow matching preserves worst-case optimality as a distribution estimator. We derive upper bounds on the estimation error for second-order flow matching, demonstrating that the convergence rates depend polynomially on the smoothness of the target distribution (quantified via Besov spaces) and key parameters of the ODE dynamics. Our analysis employs neural network approximations with carefully controlled depth, width, and sparsity to bound acceleration errors across both small and large time intervals, ultimately unifying these results into a general worst-case optimal bound for all time steps.
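The higher-order results build on standard (first-order) flow matching, which regresses a velocity field onto the known target velocity of a prescribed probability path. As a minimal sketch of that first-order objective, assuming the common straight-line path and using a toy linear model in place of a neural network (the second-order variant would additionally regress an acceleration field):

```python
import numpy as np

def flow_matching_loss(theta, x0, x1, t):
    # Straight-line probability path x_t = (1 - t) x0 + t x1, whose
    # conditional target velocity is u_t = x1 - x0 (first-order flow matching).
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1
    # Toy linear velocity field v(x_t) = x_t @ theta, a stand-in for a network.
    v = xt @ theta
    # Mean squared regression error against the target velocity.
    return np.mean(np.sum((v - (x1 - x0)) ** 2, axis=1))
```

Training minimizes this loss over `theta`; sampling then integrates the learned velocity field as a deterministic ODE from the source distribution to the data distribution.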


Distributed Learning with Discretely Observed Functional Data

Liu, Jiading, Shi, Lei

arXiv.org Machine Learning

By selecting different filter functions, spectral algorithms can generate various regularization methods to solve statistical inverse problems within the learning-from-samples framework. This paper combines distributed spectral algorithms with Sobolev kernels to tackle the functional linear regression problem. The design and mathematical analysis of the algorithms require only that the functional covariates are observed at discrete sample points. Furthermore, the hypothesis function spaces of the algorithms are the Sobolev spaces generated by the Sobolev kernels, balancing approximation capability and flexibility. Through the establishment of regularity conditions for the target function and functional covariate, we derive matching upper and lower bounds for the convergence of the distributed spectral algorithms in the Sobolev norm. This demonstrates that the proposed regularity conditions are reasonable and that the convergence analysis under these conditions is tight, capturing the essential characteristics of functional linear regression. The analytical techniques and estimates developed in this paper also improve on existing results in the literature.
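The distributed scheme underlying such algorithms can be illustrated on the simplest spectral filter, Tikhonov regularization (ordinary ridge regression on finite-dimensional data). This is only a sketch of the divide-and-conquer idea, not the paper's Sobolev-kernel method for discretely observed functional covariates: each machine applies the filter to its local data, and the local estimators are averaged.

```python
import numpy as np

def local_ridge(X, y, lam):
    # Tikhonov filter g_lam(s) = 1 / (s + lam) applied to the eigenvalues
    # of the local empirical covariance X^T X / n.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(d),
                           X.T @ y / len(y))

def distributed_ridge(X, y, lam, m):
    # Divide-and-conquer: fit on m disjoint shards, average the estimators.
    shards = np.array_split(np.arange(len(y)), m)
    return np.mean([local_ridge(X[s], y[s], lam) for s in shards], axis=0)
```

Other filter functions (spectral cut-off, Landweber iteration, etc.) replace `local_ridge` while the averaging step stays the same; the convergence analysis then controls how large `m` may grow without degrading the optimal rate.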


Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression

Liu, Jiading, Shi, Lei

arXiv.org Artificial Intelligence

Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario where the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory. We develop an integral operator approach to establish sharp finite-sample upper bounds for prediction with divide-and-conquer estimators under various regularity conditions on the explanatory variables and target function. We also prove the asymptotic optimality of the derived rates by building the minimax lower bounds. Finally, we consider the convergence of noiseless estimators and show that the rates can be arbitrarily fast under mild conditions.


Fast Adaptive Federated Bilevel Optimization

Huang, Feihu

arXiv.org Artificial Intelligence

Bilevel optimization is a popular hierarchical model in machine learning, and has been widely applied to many machine learning tasks such as meta learning, hyperparameter learning, and policy optimization. Although many bilevel optimization algorithms have recently been developed, few adaptive algorithms focus on bilevel optimization in the distributed setting. It is well known that adaptive gradient methods show superior performance in both distributed and non-distributed optimization. Thus, in this paper, we propose a novel adaptive federated bilevel optimization algorithm (i.e., AdaFBiO) to solve distributed bilevel optimization problems, where the objective function of the Upper-Level (UL) problem is possibly nonconvex and that of the Lower-Level (LL) problem is strongly convex. Specifically, our AdaFBiO algorithm builds on the momentum-based variance-reduction technique and local SGD to obtain the best known sample and communication complexities simultaneously. In particular, our AdaFBiO algorithm uses unified adaptive matrices to flexibly incorporate various adaptive learning rates to update variables in both the UL and LL problems. Moreover, we provide a convergence analysis framework for our AdaFBiO algorithm, and prove that it needs a sample complexity of $\tilde{O}(\epsilon^{-3})$ with a communication complexity of $\tilde{O}(\epsilon^{-2})$ to obtain an $\epsilon$-stationary point. Experimental results on federated hyper-representation learning and federated data hyper-cleaning tasks verify the efficiency of our algorithm.
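The bilevel structure these algorithms optimize can be shown on a toy quadratic problem where the lower-level solution and the implicit-function-theorem hypergradient have closed forms. This is only a deterministic illustration of the hypergradient step, assuming a hypothetical quadratic objective pair; it omits AdaFBiO's adaptive matrices, variance reduction, and local updates.

```python
import numpy as np

def hypergrad(x, A, b, lam):
    # Lower level: g(x, y) = 0.5 ||y - A x||^2 + 0.5 * lam * ||y||^2,
    # strongly convex in y with closed-form minimizer y*(x) = A x / (1 + lam).
    y_star = A @ x / (1.0 + lam)
    # Upper level: F(x) = f(x, y*(x)) with f(x, y) = 0.5 ||y - b||^2.
    # Implicit-function-theorem hypergradient:
    #   dF/dx = grad_x f - grad_xy g [grad_yy g]^{-1} grad_y f
    #         = A^T (y* - b) / (1 + lam).
    return A.T @ (y_star - b) / (1.0 + lam)

def bilevel_gd(A, b, lam, steps=200, lr=0.5):
    # Plain hypergradient descent on the upper-level objective.
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = x - lr * hypergrad(x, A, b, lam)
    return x
```

In practice the lower-level minimizer and the inverse Hessian-vector product are only approximated from samples, which is where the momentum-based variance reduction and adaptive step sizes of methods like AdaFBiO enter.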


Game Description Logic with Integers: A GDL Numerical Extension

Mittelmann, Munyque, Perrussel, Laurent

arXiv.org Artificial Intelligence

Many problems can be viewed as games, where one or more agents try to ensure that certain objectives hold no matter the behavior of the environment and other agents. In recent years, a number of logical formalisms have been proposed for specifying games, among which the Game Description Language (GDL) was established as the official language for General Game Playing. Although numbers are recurring in games, describing games with numerical features in GDL requires enumerating all possible numeric values and the relations among them. Therefore, in this paper, we introduce the Game Description Logic with Integers (GDLZ) to describe games with numerical variables and numerical parameters, as well as to perform numerical comparisons. We compare our approach with GDL and show that, when describing the same game, GDLZ is more compact.